Increasing performance with multiply-add units and wide buses
Abstract
A balanced increase in memory bandwidth and computational performance is one of the current trends in high-performance microprocessor design. This improvement can be attained either by replicating resources such as buses and functional units or by making them more complex. For example, some microprocessors, such as IBM's POWER2, double the width of the buses between the register file and the first-level data cache to obtain results similar to those of doubling the number of buses, but at a lower cost. Similarly, some microprocessors, such as IBM's POWER2 and RS6000 processors, use fused multiply-add functional units to increase computational capability. In this paper we evaluate the performance of these alternatives and their effects on register pressure. The performance benefits have been evaluated using 1180 kernel loops of the Perfect Club benchmarks, which account for 78% of the total execution time. The results show that the two techniques (widening buses and using fused multiply-add functional units) are complementary, cost-effective solutions for increasing processor efficiency in numerical applications.
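As a rough illustration of why the two techniques can be complementary, the sketch below computes a resource-bound lower limit on cycles per loop iteration for a DAXPY-style loop, y[i] = a*x[i] + y[i]. It is a toy model and not the evaluation method of the paper: the function name, its parameters, and the example machine configurations are all hypothetical. It assumes that every memory access is to consecutive elements, so that a wide bus can move two of them at once, and it ignores recurrences, latencies, and register pressure.

```python
from fractions import Fraction


def cycles_per_iteration(loads, stores, muls, adds, fused_pairs=0,
                         buses=1, bus_width=1, fpus=1, has_fma=False):
    """Resource-bound lower limit on cycles per iteration (toy model).

    bus_width   -- consecutive elements moved per access (assumes stride 1)
    fused_pairs -- multiply/add pairs that one fused multiply-add can replace
    """
    # A wide access moves bus_width elements, so it serves that many iterations.
    mem_accesses = Fraction(loads + stores, bus_width)
    fp_ops = muls + adds
    if has_fma:
        # Each fused pair issues as one operation instead of two.
        fp_ops -= fused_pairs
    # The busier resource class bounds the loop.
    return max(mem_accesses / buses, Fraction(fp_ops, fpus))


# DAXPY-style body: 2 loads, 1 store, 1 multiply, 1 add (one fusable pair).
daxpy = dict(loads=2, stores=1, muls=1, adds=1, fused_pairs=1)

baseline = cycles_per_iteration(**daxpy)
wide_bus = cycles_per_iteration(**daxpy, bus_width=2)
fma_only = cycles_per_iteration(**daxpy, has_fma=True)
combined = cycles_per_iteration(**daxpy, bus_width=2, has_fma=True)

for name, value in [("baseline", baseline), ("wide bus", wide_bus),
                    ("FMA only", fma_only), ("both", combined)]:
    print(f"{name}: {value} cycles/iteration")
```

Under these assumptions the loop is memory-bound on the baseline machine (3 cycles per iteration), so adding only the fused unit leaves the bound at 3, widening the bus alone lowers it to 2, and combining both lowers it to 3/2. Other loops with a different mix of memory and floating-point operations would respond differently.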
Similar resources
Modulo 2^n+1 Multiplier and Multiply-Adder for Digital Signal Processors
Nowadays, digital signal processors (DSPs) are appropriate choices for real-time image and video processing in embedded multimedia applications, not only because of their superior signal processing performance but also because of their high levels of integration and very low power consumption. Filtering, which consists of multiple addition and multiplication operations, is one of the most fundamental operatio...
A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications
In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature sizes, increased frequency and power dissipation on the data bus have become the most important factors compared to other parts of the functional units. One of the most important functional units in any processor is the multiply-accumulator (MAC) unit. The current work focuses on the develop...
Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures
Loops are the most time-consuming part of programs based on floating-point computations. The performance of the loops is limited either by recurrences in the computation or by the resources offered by the architecture. Several general-purpose superscalar microprocessors have been implemented with fused multiply-add floating-point units, which reduce the latency of the combined operation and the...
Parameterized Function Evaluation for FPGAs
This paper presents parameterized module-generators for pipelined function evaluation using lookup tables, adders, shifters and multipliers. We discuss trade-offs involved between (1) full-lookup tables, (2) bipartite (lookup-add) units, (3) lookup-multiply units, and (4) shift-and-add based CORDIC units. For lookup-multiply units we provide equations estimating approximation errors and roundin...
Floating-Point Single-Precision Fused Multiplier-adder Unit on FPGA
The fused multiply-add operation improves many calculations and is therefore already available in some general-purpose processors, such as the Itanium. Optimizing units dedicated to executing the multiply-add operation is therefore crucial to achieving optimal performance when running the overlying applications. In this paper, we present a single-precision floating-point fused multiply-add opt...
Publication date: 1997